
NOTAM-Evolve: A Knowledge-Guided Self-Evolving Optimization Framework with LLMs for NOTAM Interpretation

Liu, Maoqi, Fang, Quan, Wu, Yuhao, Zhao, Can, Yang, Yang, Cai, Kaiquan

arXiv.org Artificial Intelligence

Accurate interpretation of Notices to Airmen (NOTAMs) is critical for aviation safety, yet their condensed and cryptic language poses significant challenges to both manual and automated processing. Existing automated systems are typically limited to shallow parsing, failing to extract the actionable intelligence needed for operational decisions. We formalize the complete interpretation task as deep parsing, a dual-reasoning challenge requiring both dynamic knowledge grounding (linking the NOTAM to evolving real-world aeronautical data) and schema-based inference (applying static domain rules to deduce operational status). To tackle this challenge, we propose NOTAM-Evolve, a self-evolving framework that enables a large language model (LLM) to autonomously master complex NOTAM interpretation. Leveraging a knowledge graph-enhanced retrieval module for data grounding, the framework introduces a closed-loop learning process where the LLM progressively improves from its own outputs, minimizing the need for extensive human-annotated reasoning traces. In conjunction with this framework, we introduce a new benchmark dataset of 10,000 expert-annotated NOTAMs. Our experiments demonstrate that NOTAM-Evolve achieves a 30.4% absolute accuracy improvement over the base LLM, establishing a new state of the art on the task of structured NOTAM interpretation.
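The closed-loop idea described above (interpret, verify against the static schema, and feed verified outputs back as exemplars) can be sketched minimally as follows. Everything here is a hypothetical stand-in: the keyword rules, `interpret`, and `VALID_STATUSES` are illustrative toys, not the paper's actual model, retrieval module, or schema.

```python
def interpret(notam: str, exemplars: list) -> dict:
    """Stand-in for an LLM call. A real system would prompt the model
    with the verified exemplars as few-shot context; here we use crude
    keyword rules over common NOTAM abbreviations (CLSD = closed,
    U/S = unserviceable) purely for illustration."""
    text = notam.upper()
    if "CLSD" in text or "CLOSED" in text:
        return {"notam": notam, "status": "CLOSED"}
    if "U/S" in text:
        return {"notam": notam, "status": "UNSERVICEABLE"}
    return {"notam": notam, "status": "UNKNOWN"}

# Toy stand-in for the static domain schema used to check outputs.
VALID_STATUSES = {"CLOSED", "UNSERVICEABLE", "RESTRICTED"}

def self_evolve(notams: list) -> list:
    """Closed loop: each output that passes the schema check is kept
    as an exemplar for later iterations, in place of a human-annotated
    reasoning trace."""
    exemplars = []
    for notam in notams:
        out = interpret(notam, exemplars)
        if out["status"] in VALID_STATUSES:  # schema check, not a human label
            exemplars.append(out)            # verified output feeds the next round
    return exemplars
```

The point of the sketch is only the control flow: the model's own schema-validated outputs accumulate into the context for subsequent interpretations, so improvement needs no extra annotation.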


ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval

Abdallah, Abdelrahman, Mozafari, Jamshid, Piryani, Bhawna, Jatowt, Adam

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) models have drawn considerable attention in modern open-domain question answering. The effectiveness of RAG depends on the quality of the top retrieved documents. However, conventional retrieval methods sometimes fail to rank the most relevant documents at the top. In this paper, we introduce ASRank, a new re-ranking method that scores retrieved documents using a zero-shot answer scent: a pre-trained large language model computes the likelihood that answers derived from each document align with the answer scent. Our approach demonstrates marked improvements across several datasets, including NQ, TriviaQA, WebQA, ArchivalQA, HotpotQA, and Entity Questions. Notably, ASRank increases Top-1 retrieval accuracy on NQ from $19.2\%$ to $46.5\%$ for MSS and from $22.1\%$ to $47.3\%$ for BM25. It also shows strong retrieval performance on several datasets compared to state-of-the-art methods (47.3 Top-1 for ASRank vs. 35.4 for UPR, both over BM25).
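The scoring idea (re-rank retrieved documents by how well each one supports a zero-shot draft answer) can be sketched as below. This is a toy under loud assumptions: the fixed `answer_scent` string and the token-overlap `scent_alignment` proxy stand in for the paper's LLM generation and likelihood computation, respectively.

```python
def tokens(text: str) -> list:
    # Crude whitespace tokenizer with surrounding punctuation stripped.
    return [w.strip(".,!?").lower() for w in text.split()]

def answer_scent(question: str) -> str:
    """Stand-in for the zero-shot LLM draft answer ('answer scent').
    In ASRank a pre-trained LLM generates this from the question; the
    fixed string here is purely illustrative."""
    return "Paris is the capital of France"

def scent_alignment(document: str, scent: str) -> float:
    """Proxy for the LLM likelihood that a document-derived answer
    aligns with the scent: token overlap normalized by scent length
    (NOT the paper's actual scoring model)."""
    doc_vocab = set(tokens(document))
    scent_toks = tokens(scent)
    return sum(t in doc_vocab for t in scent_toks) / len(scent_toks)

def rerank(question: str, documents: list) -> list:
    """Re-order retrieved documents by alignment with the answer scent."""
    scent = answer_scent(question)
    return sorted(documents, key=lambda d: scent_alignment(d, scent), reverse=True)
```

Used on a small candidate list, the document containing the drafted answer's content rises to Top-1 even if the first-stage retriever (e.g. BM25 or MSS) ranked it lower.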


Surrealistic-like Image Generation with Vision-Language Models

Ayten, Elif, Wang, Shuai, Snoep, Hjalmar

arXiv.org Artificial Intelligence

Recent advances in generative AI make it convenient to create different types of content, including text, images, and code. In this paper, we explore the generation of images in the style of paintings from the surrealism movement using vision-language generative models, including DALL-E, Deep Dream Generator, and DreamStudio. Our investigation starts with the generation of images under various image generation settings and different models. The primary objective is to identify the most suitable model and settings for producing such images. Additionally, we aim to understand the impact of using edited base images on the resulting generated images. Through these experiments, we evaluate the performance of the selected models and gain valuable insights into their capabilities in generating such images. Our analysis shows that DALL-E 2 performs best when using prompts generated by ChatGPT.


Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models

Truong, Sang T., Nguyen, Duc Q., Nguyen, Toan, Le, Dong D., Truong, Nhi N., Quan, Tho, Koyejo, Sanmi

arXiv.org Artificial Intelligence

Large language models (LLMs) such as GPT-4 (OpenAI, 2023), BLOOM (Le Scao et al., 2023), LLaMa-2 (Touvron et al., 2023), Mistral (Jiang et al., 2023), Mixtral (Jiang et al., 2024), and Gemma (Team et al., 2024) have made significant contributions to the field of natural language processing (NLP). Despite their advancements, a gap remains in their specialization for many languages, including Vietnamese. This paper addresses the development and evaluation of Vietnamese-centric LLMs. Vietnam, with a population surpassing 100 million, ranks as the 16th most populous country globally. We employ fine-tuning on LLaMa-2, Mixtral 8x7B, and Gemma, and conduct a comprehensive evaluation of Vietnamese LLMs across various scenarios and settings. Throughout the thorough evaluation process, we observe the following: (i) larger language models exhibit unseen capabilities compared to smaller counterparts; (ii) larger language models tend to manifest more biases, produce uncalibrated results, and are more susceptible to the influence of input prompts; and (iii) the quality of training or fine-tuning datasets is key to unlocking LLM performance.